Emmanuel Macron

It's funny how carefully crafted the imagery on Emmanuel Macron's site is. Can we turn it into an image gallery?


In [2]:
from bs4 import BeautifulSoup
import requests

In [3]:
r = requests.get('https://en-marche.fr/emmanuel-macron/le-programme')

In [4]:
soup = BeautifulSoup(r.text, 'html.parser')

In [5]:
proposals = soup.find_all(class_='programme__proposal')

In [6]:
proposals = [p for p in proposals if 'programme__proposal--category' not in p.attrs['class']]
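
The two steps above (collect every element carrying the class, then drop the category headings) can probably be collapsed into a single CSS selector. A minimal sketch, run on a made-up HTML fragment rather than the live page: only the two class names come from the cells above, while the markup, the link texts and the `sample_*` names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the listing page: one category heading
# followed by two real proposals.
sample_html = """
<div class="programme__proposal programme__proposal--category">Heading</div>
<div class="programme__proposal"><a href="/emmanuel-macron/le-programme/agriculture">Agriculture</a></div>
<div class="programme__proposal"><a href="/emmanuel-macron/le-programme/culture">Culture</a></div>
"""
sample_soup = BeautifulSoup(sample_html, 'html.parser')

# :not(...) excludes the elements that also carry the category modifier class.
selected = sample_soup.select('.programme__proposal:not(.programme__proposal--category)')
hrefs = [tag.find('a')['href'] for tag in selected]
print(hrefs)
```

Whether this reads better than the explicit list comprehension is a matter of taste; both should select the same elements as long as the class names are as shown.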

In [7]:
len(proposals)


Out[7]:
36

In [8]:
p = proposals[0]

In [9]:
full_url = 'https://en-marche.fr' + p.find('a').attrs['href']
full_url


Out[9]:
'https://en-marche.fr/emmanuel-macron/le-programme/action-publique-fonction-publique'

In [10]:
full_urls = ['https://en-marche.fr' + p.find('a').attrs['href'] for p in proposals]

In [11]:
full_urls[:10]


Out[11]:
['https://en-marche.fr/emmanuel-macron/le-programme/action-publique-fonction-publique',
 'https://en-marche.fr/emmanuel-macron/le-programme/agriculture',
 'https://en-marche.fr/emmanuel-macron/le-programme/culture',
 'https://en-marche.fr/emmanuel-macron/le-programme/defense',
 'https://en-marche.fr/emmanuel-macron/le-programme/dependance',
 'https://en-marche.fr/emmanuel-macron/le-programme/dialogue-social',
 'https://en-marche.fr/emmanuel-macron/le-programme/education',
 'https://en-marche.fr/emmanuel-macron/le-programme/%C3%A9galit%C3%A9-hommes-et-femmes',
 'https://en-marche.fr/emmanuel-macron/le-programme/emploi-ch%C3%B4mage-securites-professionnelles',
 'https://en-marche.fr/emmanuel-macron/le-programme/enseignement-superieur-recherche']
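
Gluing the host name onto each `href` works here because every link is root-relative. A slightly more defensive sketch, assuming nothing beyond the standard library, resolves the `href` against the listing page with `urljoin`, which also copes with links that are already absolute. `BASE_URL` and `absolute_url` are names introduced for this illustration. `unquote` shows what the percent-encoded entries in the list stand for.

```python
from urllib.parse import urljoin, unquote

# Assumption: links are resolved against the page they were scraped from.
BASE_URL = 'https://en-marche.fr/emmanuel-macron/le-programme'

def absolute_url(href, base=BASE_URL):
    """Resolve an href (relative or already absolute) against the listing page."""
    return urljoin(base, href)

example = absolute_url('/emmanuel-macron/le-programme/%C3%A9galit%C3%A9-hommes-et-femmes')
print(example)           # percent-encoded form, as in the list above
print(unquote(example))  # human-readable form with the accented characters
```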

In [12]:
r = requests.get(full_url)
soup = BeautifulSoup(r.text, 'html.parser')

In [13]:
figure_tag = soup.find('figure', class_='fullscreen')
figure_tag


Out[13]:
<figure class="fullscreen">
<img alt="01-fonction-publique-hospital-sante-emmanuel-macron-en-marche" src="/assets/images/01-fonction-publique-hospital-sante-emmanuel-macron-en-marche?q=70&amp;cache=e7d04db7e6ec8a188aee&amp;fm=pjpg&amp;s=97b9c84c57c417dcef72c4919e6f2625" title="Action publique / Fonction publique"/>
</figure>

We can now extract the link to the image.


In [14]:
src_url = 'https://en-marche.fr' + figure_tag('img')[0].attrs['src']
src_url


Out[14]:
'https://en-marche.fr/assets/images/01-fonction-publique-hospital-sante-emmanuel-macron-en-marche?q=70&cache=e7d04db7e6ec8a188aee&fm=pjpg&s=97b9c84c57c417dcef72c4919e6f2625'
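
The image path carries no file extension; everything else sits in the query string. As a small sketch, the standard library can split the link into its path and its parameters. The URL is the one printed above; what each parameter means to the server is not documented here, so the code only lists them.

```python
from urllib.parse import urlsplit, parse_qs

src_example = ('https://en-marche.fr/assets/images/'
               '01-fonction-publique-hospital-sante-emmanuel-macron-en-marche'
               '?q=70&cache=e7d04db7e6ec8a188aee&fm=pjpg'
               '&s=97b9c84c57c417dcef72c4919e6f2625')

parts = urlsplit(src_example)
params = parse_qs(parts.query)  # maps each parameter name to a list of values

print(parts.path)
print(sorted(params))
```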

We can display it in the notebook.


In [15]:
from IPython.display import Image

In [16]:
Image(url=src_url)


Out[16]:

In [17]:
def extract_img_src(url):
    "Extracts image src url from linked page."
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    figure_tag = soup.find('figure', class_='fullscreen')
    # find() returns None when there is no <img>, unlike figure_tag('img'),
    # which returns a (possibly empty) list and is therefore never None.
    img_tag = figure_tag.find('img') if figure_tag is not None else None
    if img_tag is not None:
        return 'https://en-marche.fr' + img_tag.attrs['src']
    print("no image for url: {}".format(url))
    return None
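
The function above assumes that every request succeeds. A hedged sketch of a more forgiving download step follows; `fetch_html` is a hypothetical helper, not part of the notebook. It takes the getter as an argument so that it can be exercised without network access, and in the notebook one would pass `requests.get`.

```python
def fetch_html(url, get, timeout=10):
    """Return the page body as text, or None when the request fails.

    `get` is any callable with the calling convention of `requests.get`.
    """
    try:
        response = get(url, timeout=timeout)
        # Turn HTTP error statuses (404, 500, ...) into exceptions.
        response.raise_for_status()
    except Exception as exc:  # broad on purpose: network and HTTP errors alike
        print("could not fetch {}: {}".format(url, exc))
        return None
    return response.text
```

With this helper, a call such as `fetch_html(url, requests.get)` returns `None` for a page that cannot be downloaded, which the caller can treat the same way as a page without an image.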

We can repeat this process and build a gallery from all these images.


In [18]:
srcs = [extract_img_src(url) for url in full_urls]


no image for url: https://en-marche.fr/emmanuel-macron/le-programme/familles-et-societe
no image for url: https://en-marche.fr/emmanuel-macron/le-programme/handicap
no image for url: https://en-marche.fr/emmanuel-macron/le-programme/immigration-et-asile
no image for url: https://en-marche.fr/emmanuel-macron/le-programme/justice
no image for url: https://en-marche.fr/emmanuel-macron/le-programme/pauvrete

In [19]:
srcs = [src for src in srcs if src is not None]

In [20]:
header = """<!doctype html>
<html lang="fr">
<head>
  <meta charset="utf-8">
  <title>Gallerie des photos du site d'Emmanuel Macron</title>
  <style>
  img {width: 100%;}
  </style>
</head>"""

In [22]:
def format_as_img_tag(src):
    # Quote the attribute value so that characters in the URL cannot end it early.
    return '<img src="{}" />'.format(src)

In [23]:
format_as_img_tag(srcs[2])


Out[23]:
'<img src="https://en-marche.fr/assets/images/04-culture-musee-exposition-guadeloupe-emmanuel-macron?q=70&cache=0f9e2f1675c10ef5c67b&fm=pjpg&s=92575e4c67cd6a07095acfd08652efb6" />'

In [24]:
with open('galerie_macron.html', 'w', encoding='utf-8') as f:  # match the <meta charset> declared in the header
    body = """<body>
{0}
</body>""".format("\n".join(format_as_img_tag(url) for url in srcs))
    html = header + body + "</html>"
    f.write(html)
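
The page is assembled from a fixed header string plus a body. As a sketch of an alternative, the whole document can be produced by one function that takes the title as a parameter and HTML-escapes the URLs, since they contain `&`. `build_gallery_html` and its parameters are hypothetical names; the markup mirrors the header defined above.

```python
from html import escape

def build_gallery_html(title, srcs, lang='fr'):
    """Return a complete HTML document showing each image at full width."""
    img_tags = "\n".join(
        '<img src="{}" />'.format(escape(src, quote=True)) for src in srcs
    )
    # Doubled braces are literal braces in str.format templates.
    return """<!doctype html>
<html lang="{lang}">
<head>
  <meta charset="utf-8">
  <title>{title}</title>
  <style>
  img {{width: 100%;}}
  </style>
</head>
<body>
{images}
</body>
</html>""".format(lang=escape(lang, quote=True), title=escape(title), images=img_tags)

# Placeholder URL for illustration only.
page = build_gallery_html("Galerie", ['https://example.org/a?q=70&fm=pjpg'])
print(page)
```

Passing the title in makes it straightforward to reuse the same function for a second gallery with a different name.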

These are lovely photos...

François Fillon

Now that François Fillon's programme has been released, we can repeat the approach.


In [35]:
r = requests.get('https://www.fillon2017.fr/projet/')
soup = BeautifulSoup(r.text, 'html.parser')

In [36]:
tags = soup.find_all('a', class_='projectItem__inner')

In [37]:
sublinks = [tag.attrs['href'] for tag in tags]

Let's tackle the individual pages.


In [39]:
sublinks[0]


Out[39]:
'https://www.fillon2017.fr/projet/competitivite/'

In [38]:
r = requests.get(sublinks[0])
soup = BeautifulSoup(r.text, 'html.parser')

In [48]:
src = soup.find('div', class_='singleProject__banner bannerWithMask backgroundCover').attrs['style'].split("background-image: url(")[1][1:-3]

In [49]:
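def extract_img_src(url):
    "Extracts the banner image url from the inline style of a project page."
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    banner = soup.find('div', class_='singleProject__banner bannerWithMask backgroundCover')
    # Keep what follows "url(" and trim the opening quote and the three trailing characters.
    src = banner.attrs['style'].split("background-image: url(")[1][1:-3]
    return src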
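
Slicing with `[1:-3]` depends on the exact punctuation around the URL in the `style` attribute. A regular expression is a somewhat sturdier way to pull the URL out of a `background-image` declaration. The sketch below uses only the standard library; `background_image_url` is a hypothetical helper, and the sample style string is an assumption about what the attribute looks like, not a copy of the site's markup.

```python
import re

# Matches url(...) with optional single or double quotes around the address.
BACKGROUND_URL_RE = re.compile(
    r"""background-image\s*:\s*url\(\s*['"]?(.*?)['"]?\s*\)"""
)

def background_image_url(style):
    """Return the URL inside a CSS background-image declaration, or None."""
    match = BACKGROUND_URL_RE.search(style)
    return match.group(1) if match else None

# Assumed shape of the inline style; the real attribute may differ.
sample_style = ("background-image: "
                "url('https://www.fillon2017.fr/wp-content/uploads/2016/01/DSCF7108.jpg');")
print(background_image_url(sample_style))
```

The helper returns `None` instead of raising when the declaration is absent, so pages without a banner can be skipped. It does not handle URLs that themselves contain a closing parenthesis.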

In [51]:
srcs = [extract_img_src(url) for url in sublinks]

In [52]:
srcs


Out[52]:
['https://www.fillon2017.fr/wp-content/uploads/2016/01/DSCF7108.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/FRANCOIS_FILLON_LIMOUSIN_0558-1024x457.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/allocation_sociale_unique-1024x509.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2016/01/DSCF5325.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_0849-1024x478.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_7587-1024x416.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/DSCF7085-1024x462.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/fonction_publique-1024x451.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2016/06/DSC_1847.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1579-1024x470.png',
 'https://www.fillon2017.fr/wp-content/uploads/2016/05/femmes-1-1024x314.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/DSCF7280-1024x411.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/1234432974.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/defense-1024x471.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1232-1024x523.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/enseignement_recherche-1024x474.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/FRANCOIS_FILLON_LIMOUSIN_0327-1-1024x458.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/chasse_ff-1024x530.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/DSCF7239-1024x420.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_0926-1024x465.png',
 'https://www.fillon2017.fr/wp-content/uploads/2016/03/DSCF5040.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/pouv_achat-1024x494.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1341-1024x519.png',
 'https://www.fillon2017.fr/wp-content/uploads/2016/04/IMG_0353.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/reforme_etat-1024x388.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_8849-1024x576.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1135-1024x445.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/image1-1024x444.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/FD_2439_2-1024x411.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/13062320_10154143009027533_3114262831415445150_n.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2016/11/Capture-d’écran-2017-02-23-à-00.05.53.png',
 'https://www.fillon2017.fr/wp-content/uploads/2016/01/Capture-d’écran-2017-02-23-à-10.29.24.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1742-1024x456.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_0952-1024x434.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_9977-1024x450.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2015/11/IMG_8838.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/politique_ville-1024x478.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/DSCF5048-1024x390.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/etranger-1024x427.jpg']

In [53]:
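with open('galerie_fillon.html', 'w', encoding='utf-8') as f:
    body = """<body>
{0}
</body>""".format("\n".join(format_as_img_tag(url) for url in srcs))
    # The header defined earlier hard-codes Macron's name in the <title>; swap it for this gallery.
    html = header.replace("d'Emmanuel Macron", "de François Fillon") + body + "</html>"
    f.write(html)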
